
Check if connection is valid while getting a client from the pool #1957


Closed
xizheyin wants to merge 2 commits

Conversation

xizheyin (Contributor) commented

Fixes #1887

It is possible that a client connection in the pool has not been used for a long time and the server has automatically closed it; however, is_closed only checks the state of the local client, not whether the server side still holds the connection. I added the check to improve robustness.
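A minimal sketch of the added check, assuming a tokio_postgres-backed pool (the SELECT 1 probe is my choice; only is_closed and validate_connection come from the actual change being discussed):

```rust
use tokio_postgres::Client;

/// Sketch of the added check: `is_closed()` only reflects local state,
/// so also make one round trip to confirm the server still answers.
async fn validate_connection(client: &Client) -> bool {
    if client.is_closed() {
        return false;
    }
    // Fails if the server has silently dropped the connection.
    client.simple_query("SELECT 1").await.is_ok()
}
```

On checkout from the pool, a client failing this check would be discarded and replaced with a freshly opened connection.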

Alternatively, there are other ways to keep the connections alive more reliably, such as a periodic keepalive (sketched below).
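For illustration only (not part of this PR, and the interval is arbitrary), such a keepalive could ping idle pooled connections so server-side idle timeouts never fire:

```rust
use std::time::Duration;
use tokio_postgres::Client;

/// Hypothetical keepalive: ping an idle pooled connection on an interval
/// so the server never closes it as idle.
async fn keepalive(client: &Client) {
    let mut interval = tokio::time::interval(Duration::from_secs(30));
    loop {
        interval.tick().await;
        if client.simple_query("SELECT 1").await.is_err() {
            // Connection is dead; stop pinging and let the pool drop it.
            break;
        }
    }
}
```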

Urgau (Member) commented May 19, 2025

I know next to nothing about our database connections; maybe @ehuss or @Kobzol knows about it, and whether this is conceptually a good idea.

ehuss (Contributor) commented May 19, 2025

Just quickly looking at this, it seems like it might have a few issues:

  • It doesn't look like it manages the permit. That means the permit count and the pool count could get out of sync, allowing more connections than the semaphore is intended to allow.
  • It doesn't directly address #1887 (Scheduled jobs often fail due to db timeout), which is more of a theory about idle connections being terminated, and about not knowing that until the job has been popped off the queue.
  • If the connection is broken, checking it in get() isn't going to help, since the call to validate_connection could itself hang, which would be no different from just running a normal query (see the sketch after this list).
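
To make the hang concern concrete: any validation in get() would at least need a timeout bound. A sketch under that assumption (the two-second limit is arbitrary, and this still does nothing for the permit accounting above):

```rust
use std::time::Duration;
use tokio_postgres::Client;

/// Hypothetical timeout-guarded validation: without the bound, a broken
/// connection would make get() hang exactly like a normal query would.
async fn validate_with_timeout(client: &Client) -> bool {
    tokio::time::timeout(Duration::from_secs(2), client.simple_query("SELECT 1"))
        .await                  // Err(Elapsed) if the server never answers
        .map(|res| res.is_ok()) // otherwise, the query's own result
        .unwrap_or(false)       // a timeout counts as a broken connection
}
```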

When I have more time, I can maybe look at some suggestions. Or maybe @Kobzol has more time right now.

Kobzol (Contributor) commented May 20, 2025

Tbh I don't think this is the right approach, as we aren't even sure what the cause of the original issue was. These timeouts also happen in rustc-perf, and IMO they are caused simply by the DB not responding because it is overloaded, or perhaps by a network issue. The DB server has been beefed up recently, so maybe these issues aren't even happening anymore; they seem to have stopped happening on rustc-perf once the server was upgraded. So I would keep the connection pooling as it is for now.

xizheyin (Contributor, Author) commented

OK. I opened this because I have encountered similar problems before, but there may be various reasons behind the original issue. Thank you all three for the review! I'll close it.

xizheyin closed this May 20, 2025